{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\nChatbot Tutorial\n================\n**Author:** `Matthew Inkawhich <https://github.com/MatthewInkawhich>`_\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In this tutorial, we explore a fun and interesting use-case of recurrent\nsequence-to-sequence models. We will train a simple chatbot using movie\nscripts from the `Cornell Movie-Dialogs\nCorpus <https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html>`__.\n\nConversational models are a hot topic in artificial intelligence\nresearch. Chatbots can be found in a variety of settings, including\ncustomer service applications and online helpdesks. These bots are often\npowered by retrieval-based models, which output predefined responses to\nquestions of certain forms. In a highly restricted domain like a\ncompany\u2019s IT helpdesk, these models may be sufficient, however, they are\nnot robust enough for more general use-cases. Teaching a machine to\ncarry out a meaningful conversation with a human in multiple domains is\na research question that is far from solved. Recently, the deep learning\nboom has allowed for powerful generative models like Google\u2019s `Neural\nConversational Model <https://arxiv.org/abs/1506.05869>`__, which marks\na large step towards multi-domain generative conversational models. In\nthis tutorial, we will implement this kind of model in PyTorch.\n\n.. figure:: /_static/img/chatbot/bot.png\n   :align: center\n   :alt: bot\n\n.. 
code:: python\n\n  > hello?\n  Bot: hello .\n  > where am I?\n  Bot: you re in a hospital .\n  > who are you?\n  Bot: i m a lawyer .\n  > how are you doing?\n  Bot: i m fine .\n  > are you my friend?\n  Bot: no .\n  > you're under arrest\n  Bot: i m trying to help you !\n  > i'm just kidding\n  Bot: i m sorry .\n  > where are you from?\n  Bot: san francisco .\n  > it's time for me to leave\n  Bot: i know .\n  > goodbye\n  Bot: goodbye .\n\n**Tutorial Highlights**\n\n-  Handle loading and preprocessing of `Cornell Movie-Dialogs\n   Corpus <https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html>`__\n   dataset\n-  Implement a sequence-to-sequence model with `Luong attention\n   mechanism(s) <https://arxiv.org/abs/1508.04025>`__\n-  Jointly train encoder and decoder models using mini-batches\n-  Implement greedy-search decoding module\n-  Interact with trained chatbot\n\n**Acknowledgements**\n\nThis tutorial borrows code from the following sources:\n\n1) Yuan-Kuei Wu\u2019s pytorch-chatbot implementation:\n   https://github.com/ywk991112/pytorch-chatbot\n\n2) Sean Robertson\u2019s practical-pytorch seq2seq-translation example:\n   https://github.com/spro/practical-pytorch/tree/master/seq2seq-translation\n\n3) FloydHub\u2019s Cornell Movie Corpus preprocessing code:\n   https://github.com/floydhub/textutil-preprocess-cornell-movie-corpus\n\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Preparations\n------------\n\nTo start, Download the data ZIP file\n`here <https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html>`__\nand put in a ``data/`` directory under the current directory.\n\nAfter that, let\u2019s import some necessities.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "from __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\nfrom __future__ import unicode_literals\n\nimport torch\nfrom torch.jit import script, trace\nimport torch.nn as nn\nfrom torch import optim\nimport torch.nn.functional as F\nimport csv\nimport random\nimport re\nimport os\nimport unicodedata\nimport codecs\nfrom io import open\nimport itertools\nimport math\n\n\nUSE_CUDA = torch.cuda.is_available()\ndevice = torch.device(\"cuda\" if USE_CUDA else \"cpu\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Load & Preprocess Data\n----------------------\n\nThe next step is to reformat our data file and load the data into\nstructures that we can work with.\n\nThe `Cornell Movie-Dialogs\nCorpus <https://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html>`__\nis a rich dataset of movie character dialog:\n\n-  220,579 conversational exchanges between 10,292 pairs of movie\n   characters\n-  9,035 characters from 617 movies\n-  304,713 total utterances\n\nThis dataset is large and diverse, and there is a great variation of\nlanguage formality, time periods, sentiment, etc. Our hope is that this\ndiversity makes our model robust to many forms of inputs and queries.\n\nFirst, we\u2019ll take a look at some lines of our datafile to see the\noriginal format.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "corpus_name = \"cornell movie-dialogs corpus\"\ncorpus = os.path.join(\"data\", corpus_name)\n\ndef printLines(file, n=10):\n    with open(file, 'rb') as datafile:\n        lines = datafile.readlines()\n    for line in lines[:n]:\n        print(line)\n\nprintLines(os.path.join(corpus, \"movie_lines.txt\"))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Create formatted data file\n~~~~~~~~~~~~~~~~~~~~~~~~~~\n\nFor convenience, we'll create a nicely formatted data file in which each line\ncontains a tab-separated *query sentence* and a *response sentence* pair.\n\nThe following functions facilitate the parsing of the raw\n*movie_lines.txt* data file.\n\n-  ``loadLines`` splits each line of the file into a dictionary of\n   fields (lineID, characterID, movieID, character, text)\n-  ``loadConversations`` groups fields of lines from ``loadLines`` into\n   conversations based on *movie_conversations.txt*\n-  ``extractSentencePairs`` extracts pairs of sentences from\n   conversations\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Splits each line of the file into a dictionary of fields\ndef loadLines(fileName, fields):\n    lines = {}\n    with open(fileName, 'r', encoding='iso-8859-1') as f:\n        for line in f:\n            values = line.split(\" +++$+++ \")\n            # Extract fields\n            lineObj = {}\n            for i, field in enumerate(fields):\n                lineObj[field] = values[i]\n            lines[lineObj['lineID']] = lineObj\n    return lines\n\n\n# Groups fields of lines from `loadLines` into conversations based on *movie_conversations.txt*\ndef loadConversations(fileName, lines, fields):\n    conversations = []\n    with open(fileName, 'r', encoding='iso-8859-1') as f:\n        for line in f:\n            values = line.split(\" +++$+++ \")\n            # Extract fields\n            convObj = {}\n            for i, field in enumerate(fields):\n                convObj[field] = values[i]\n            # Convert string to list (convObj[\"utteranceIDs\"] == \"['L598485', 'L598486', ...]\")\n            lineIds = eval(convObj[\"utteranceIDs\"])\n            # Reassemble lines\n            convObj[\"lines\"] = []\n            for lineId in lineIds:\n                convObj[\"lines\"].append(lines[lineId])\n            conversations.append(convObj)\n    return conversations\n\n\n# Extracts pairs of sentences from conversations\ndef extractSentencePairs(conversations):\n    qa_pairs = []\n    for conversation in conversations:\n        # Iterate over all the lines of the conversation\n        for i in range(len(conversation[\"lines\"]) - 1):  # We ignore the last line (no answer for it)\n            inputLine = conversation[\"lines\"][i][\"text\"].strip()\n            targetLine = conversation[\"lines\"][i+1][\"text\"].strip()\n            # Filter wrong samples (if one of the lists is empty)\n            if inputLine and targetLine:\n                qa_pairs.append([inputLine, targetLine])\n    return qa_pairs"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we\u2019ll call these functions and create the file. We\u2019ll call it\n*formatted_movie_lines.txt*.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Define path to new file\ndatafile = os.path.join(corpus, \"formatted_movie_lines.txt\")\n\ndelimiter = '\\t'\n# Unescape the delimiter\ndelimiter = str(codecs.decode(delimiter, \"unicode_escape\"))\n\n# Initialize lines dict, conversations list, and field ids\nlines = {}\nconversations = []\nMOVIE_LINES_FIELDS = [\"lineID\", \"characterID\", \"movieID\", \"character\", \"text\"]\nMOVIE_CONVERSATIONS_FIELDS = [\"character1ID\", \"character2ID\", \"movieID\", \"utteranceIDs\"]\n\n# Load lines and process conversations\nprint(\"\\nProcessing corpus...\")\nlines = loadLines(os.path.join(corpus, \"movie_lines.txt\"), MOVIE_LINES_FIELDS)\nprint(\"\\nLoading conversations...\")\nconversations = loadConversations(os.path.join(corpus, \"movie_conversations.txt\"),\n                                  lines, MOVIE_CONVERSATIONS_FIELDS)\n\n# Write new csv file\nprint(\"\\nWriting newly formatted file...\")\nwith open(datafile, 'w', encoding='utf-8') as outputfile:\n    writer = csv.writer(outputfile, delimiter=delimiter, lineterminator='\\n')\n    for pair in extractSentencePairs(conversations):\n        writer.writerow(pair)\n\n# Print a sample of lines\nprint(\"\\nSample lines from file:\")\nprintLines(datafile)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Load and trim data\n~~~~~~~~~~~~~~~~~~\n\nOur next order of business is to create a vocabulary and load\nquery/response sentence pairs into memory.\n\nNote that we are dealing with sequences of **words**, which do not have\nan implicit mapping to a discrete numerical space. Thus, we must create\none by mapping each unique word that we encounter in our dataset to an\nindex value.\n\nFor this we define a ``Voc`` class, which keeps a mapping from words to\nindexes, a reverse mapping of indexes to words, a count of each word and\na total word count. The class provides methods for adding a word to the\nvocabulary (``addWord``), adding all words in a sentence\n(``addSentence``) and trimming infrequently seen words (``trim``). More\non trimming later.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Default word tokens\nPAD_token = 0  # Used for padding short sentences\nSOS_token = 1  # Start-of-sentence token\nEOS_token = 2  # End-of-sentence token\n\nclass Voc:\n    def __init__(self, name):\n        self.name = name\n        self.trimmed = False\n        self.word2index = {}\n        self.word2count = {}\n        self.index2word = {PAD_token: \"PAD\", SOS_token: \"SOS\", EOS_token: \"EOS\"}\n        self.num_words = 3  # Count SOS, EOS, PAD\n\n    def addSentence(self, sentence):\n        for word in sentence.split(' '):\n            self.addWord(word)\n\n    def addWord(self, word):\n        if word not in self.word2index:\n            self.word2index[word] = self.num_words\n            self.word2count[word] = 1\n            self.index2word[self.num_words] = word\n            self.num_words += 1\n        else:\n            self.word2count[word] += 1\n\n    # Remove words below a certain count threshold\n    def trim(self, min_count):\n        if self.trimmed:\n            return\n        self.trimmed = True\n\n        keep_words = []\n\n        for k, v in self.word2count.items():\n            if v >= min_count:\n                keep_words.append(k)\n\n        print('keep_words {} / {} = {:.4f}'.format(\n            len(keep_words), len(self.word2index), len(keep_words) / len(self.word2index)\n        ))\n\n        # Reinitialize dictionaries\n        self.word2index = {}\n        self.word2count = {}\n        self.index2word = {PAD_token: \"PAD\", SOS_token: \"SOS\", EOS_token: \"EOS\"}\n        self.num_words = 3 # Count default tokens\n\n        for word in keep_words:\n            self.addWord(word)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now we can assemble our vocabulary and query/response sentence pairs.\nBefore we are ready to use this data, we must perform some\npreprocessing.\n\nFirst, we must convert the Unicode strings to ASCII using\n``unicodeToAscii``. Next, we should convert all letters to lowercase and\ntrim all non-letter characters except for basic punctuation\n(``normalizeString``). Finally, to aid in training convergence, we will\nfilter out sentences with length greater than the ``MAX_LENGTH``\nthreshold (``filterPairs``).\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "MAX_LENGTH = 10  # Maximum sentence length to consider\n\n# Turn a Unicode string to plain ASCII, thanks to\n# https://stackoverflow.com/a/518232/2809427\ndef unicodeToAscii(s):\n    return ''.join(\n        c for c in unicodedata.normalize('NFD', s)\n        if unicodedata.category(c) != 'Mn'\n    )\n\n# Lowercase, trim, and remove non-letter characters\ndef normalizeString(s):\n    s = unicodeToAscii(s.lower().strip())\n    s = re.sub(r\"([.!?])\", r\" \\1\", s)\n    s = re.sub(r\"[^a-zA-Z.!?]+\", r\" \", s)\n    s = re.sub(r\"\\s+\", r\" \", s).strip()\n    return s\n\n# Read query/response pairs and return a voc object\ndef readVocs(datafile, corpus_name):\n    print(\"Reading lines...\")\n    # Read the file and split into lines\n    lines = open(datafile, encoding='utf-8').\\\n        read().strip().split('\\n')\n    # Split every line into pairs and normalize\n    pairs = [[normalizeString(s) for s in l.split('\\t')] for l in lines]\n    voc = Voc(corpus_name)\n    return voc, pairs\n\n# Returns True iff both sentences in a pair 'p' are under the MAX_LENGTH threshold\ndef filterPair(p):\n    # Input sequences need to preserve the last word for EOS token\n    return len(p[0].split(' ')) < MAX_LENGTH and len(p[1].split(' ')) < MAX_LENGTH\n\n# Filter pairs using filterPair condition\ndef filterPairs(pairs):\n    return [pair for pair in pairs if filterPair(pair)]\n\n# Using the functions defined above, return a populated voc object and pairs list\ndef loadPrepareData(corpus, corpus_name, datafile, save_dir):\n    print(\"Start preparing training data ...\")\n    voc, pairs = readVocs(datafile, corpus_name)\n    print(\"Read {!s} sentence pairs\".format(len(pairs)))\n    pairs = filterPairs(pairs)\n    print(\"Trimmed to {!s} sentence pairs\".format(len(pairs)))\n    print(\"Counting words...\")\n    for pair in pairs:\n        voc.addSentence(pair[0])\n        voc.addSentence(pair[1])\n    print(\"Counted words:\", voc.num_words)\n    return voc, 
pairs\n\n\n# Load/Assemble voc and pairs\nsave_dir = os.path.join(\"data\", \"save\")\nvoc, pairs = loadPrepareData(corpus, corpus_name, datafile, save_dir)\n# Print some pairs to validate\nprint(\"\\npairs:\")\nfor pair in pairs[:10]:\n    print(pair)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Another tactic that is beneficial to achieving faster convergence during\ntraining is trimming rarely used words out of our vocabulary. Decreasing\nthe feature space will also soften the difficulty of the function that\nthe model must learn to approximate. We will do this as a two-step\nprocess:\n\n1) Trim words used under ``MIN_COUNT`` threshold using the ``voc.trim``\n   function.\n\n2) Filter out pairs with trimmed words.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "MIN_COUNT = 3    # Minimum word count threshold for trimming\n\ndef trimRareWords(voc, pairs, MIN_COUNT):\n    # Trim words used under the MIN_COUNT from the voc\n    voc.trim(MIN_COUNT)\n    # Filter out pairs with trimmed words\n    keep_pairs = []\n    for pair in pairs:\n        input_sentence = pair[0]\n        output_sentence = pair[1]\n        keep_input = True\n        keep_output = True\n        # Check input sentence\n        for word in input_sentence.split(' '):\n            if word not in voc.word2index:\n                keep_input = False\n                break\n        # Check output sentence\n        for word in output_sentence.split(' '):\n            if word not in voc.word2index:\n                keep_output = False\n                break\n\n        # Only keep pairs that do not contain trimmed word(s) in their input or output sentence\n        if keep_input and keep_output:\n            keep_pairs.append(pair)\n\n    print(\"Trimmed from {} pairs to {}, {:.4f} of total\".format(len(pairs), len(keep_pairs), len(keep_pairs) / len(pairs)))\n    return keep_pairs\n\n\n# Trim voc and pairs\npairs = trimRareWords(voc, pairs, MIN_COUNT)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Prepare Data for Models\n-----------------------\n\nAlthough we have put a great deal of effort into preparing and massaging our\ndata into a nice vocabulary object and list of sentence pairs, our models\nwill ultimately expect numerical torch tensors as inputs. One way to\nprepare the processed data for the models can be found in the `seq2seq\ntranslation\ntutorial <https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html>`__.\nIn that tutorial, we use a batch size of 1, meaning that all we have to\ndo is convert the words in our sentence pairs to their corresponding\nindexes from the vocabulary and feed this to the models.\n\nHowever, if you\u2019re interested in speeding up training and/or would like\nto leverage GPU parallelization capabilities, you will need to train\nwith mini-batches.\n\nUsing mini-batches also means that we must be mindful of the variation\nof sentence length in our batches. To accomodate sentences of different\nsizes in the same batch, we will make our batched input tensor of shape\n*(max_length, batch_size)*, where sentences shorter than the\n*max_length* are zero padded after an *EOS_token*.\n\nIf we simply convert our English sentences to tensors by converting\nwords to their indexes(\\ ``indexesFromSentence``) and zero-pad, our\ntensor would have shape *(batch_size, max_length)* and indexing the\nfirst dimension would return a full sequence across all time-steps.\nHowever, we need to be able to index our batch along time, and across\nall sequences in the batch. Therefore, we transpose our input batch\nshape to *(max_length, batch_size)*, so that indexing across the first\ndimension returns a time step across all sentences in the batch. We\nhandle this transpose implicitly in the ``zeroPadding`` function.\n\n.. 
figure:: /_static/img/chatbot/seq2seq_batches.png\n   :align: center\n   :alt: batches\n\nThe ``inputVar`` function handles the process of converting sentences to\ntensor, ultimately creating a correctly shaped zero-padded tensor. It\nalso returns a tensor of ``lengths`` for each of the sequences in the\nbatch which will be passed to our decoder later.\n\nThe ``outputVar`` function performs a similar function to ``inputVar``,\nbut instead of returning a ``lengths`` tensor, it returns a binary mask\ntensor and a maximum target sentence length. The binary mask tensor has\nthe same shape as the output target tensor, but every element that is a\n*PAD_token* is 0 and all others are 1.\n\n``batch2TrainData`` simply takes a bunch of pairs and returns the input\nand target tensors using the aforementioned functions.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def indexesFromSentence(voc, sentence):\n    return [voc.word2index[word] for word in sentence.split(' ')] + [EOS_token]\n\n\ndef zeroPadding(l, fillvalue=PAD_token):\n    return list(itertools.zip_longest(*l, fillvalue=fillvalue))\n\ndef binaryMatrix(l, value=PAD_token):\n    m = []\n    for i, seq in enumerate(l):\n        m.append([])\n        for token in seq:\n            if token == PAD_token:\n                m[i].append(0)\n            else:\n                m[i].append(1)\n    return m\n\n# Returns padded input sequence tensor and lengths\ndef inputVar(l, voc):\n    indexes_batch = [indexesFromSentence(voc, sentence) for sentence in l]\n    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])\n    padList = zeroPadding(indexes_batch)\n    padVar = torch.LongTensor(padList)\n    return padVar, lengths\n\n# Returns padded target sequence tensor, padding mask, and max target length\ndef outputVar(l, voc):\n    indexes_batch = [indexesFromSentence(voc, sentence) for sentence in l]\n    max_target_len = max([len(indexes) for indexes in indexes_batch])\n    padList = zeroPadding(indexes_batch)\n    mask = binaryMatrix(padList)\n    mask = torch.ByteTensor(mask)\n    padVar = torch.LongTensor(padList)\n    return padVar, mask, max_target_len\n\n# Returns all items for a given batch of pairs\ndef batch2TrainData(voc, pair_batch):\n    pair_batch.sort(key=lambda x: len(x[0].split(\" \")), reverse=True)\n    input_batch, output_batch = [], []\n    for pair in pair_batch:\n        input_batch.append(pair[0])\n        output_batch.append(pair[1])\n    inp, lengths = inputVar(input_batch, voc)\n    output, mask, max_target_len = outputVar(output_batch, voc)\n    return inp, lengths, output, mask, max_target_len\n\n\n# Example for validation\nsmall_batch_size = 5\nbatches = batch2TrainData(voc, [random.choice(pairs) for _ in range(small_batch_size)])\ninput_variable, lengths, target_variable, mask, max_target_len = 
batches\n\nprint(\"input_variable:\", input_variable)\nprint(\"lengths:\", lengths)\nprint(\"target_variable:\", target_variable)\nprint(\"mask:\", mask)\nprint(\"max_target_len:\", max_target_len)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define Models\n-------------\n\nSeq2Seq Model\n~~~~~~~~~~~~~\n\nThe brains of our chatbot is a sequence-to-sequence (seq2seq) model. The\ngoal of a seq2seq model is to take a variable-length sequence as an\ninput, and return a variable-length sequence as an output using a\nfixed-sized model.\n\n`Sutskever et al. <https://arxiv.org/abs/1409.3215>`__ discovered that\nby using two separate recurrent neural nets together, we can accomplish\nthis task. One RNN acts as an **encoder**, which encodes a variable\nlength input sequence to a fixed-length context vector. In theory, this\ncontext vector (the final hidden layer of the RNN) will contain semantic\ninformation about the query sentence that is input to the bot. The\nsecond RNN is a **decoder**, which takes an input word and the context\nvector, and returns a guess for the next word in the sequence and a\nhidden state to use in the next iteration.\n\n.. figure:: /_static/img/chatbot/seq2seq_ts.png\n   :align: center\n   :alt: model\n\nImage source:\nhttps://jeddy92.github.io/JEddy92.github.io/ts_seq2seq_intro/\n\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Encoder\n~~~~~~~\n\nThe encoder RNN iterates through the input sentence one token\n(e.g.\u00a0word) at a time, at each time step outputting an \u201coutput\u201d vector\nand a \u201chidden state\u201d vector. The hidden state vector is then passed to\nthe next time step, while the output vector is recorded. The encoder\ntransforms the context it saw at each point in the sequence into a set\nof points in a high-dimensional space, which the decoder will use to\ngenerate a meaningful output for the given task.\n\nAt the heart of our encoder is a multi-layered Gated Recurrent Unit,\ninvented by `Cho et al. <https://arxiv.org/pdf/1406.1078v3.pdf>`__ in\n2014. We will use a bidirectional variant of the GRU, meaning that there\nare essentially two independent RNNs: one that is fed the input sequence\nin normal sequential order, and one that is fed the input sequence in\nreverse order. The outputs of each network are summed at each time step.\nUsing a bidirectional GRU will give us the advantage of encoding both\npast and future context.\n\nBidirectional RNN:\n\n.. figure:: /_static/img/chatbot/RNN-bidirectional.png\n   :width: 70%\n   :align: center\n   :alt: rnn_bidir\n\nImage source: https://colah.github.io/posts/2015-09-NN-Types-FP/\n\nNote that an ``embedding`` layer is used to encode our word indices in\nan arbitrarily sized feature space. For our models, this layer will map\neach word to a feature space of size *hidden_size*. 
When trained, these\nvalues should encode semantic similarity between similar meaning words.\n\nFinally, if passing a padded batch of sequences to an RNN module, we\nmust pack and unpack padding around the RNN pass using\n``nn.utils.rnn.pack_padded_sequence`` and\n``nn.utils.rnn.pad_packed_sequence`` respectively.\n\n**Computation Graph:**\n\n   1) Convert word indexes to embeddings.\n   2) Pack padded batch of sequences for RNN module.\n   3) Forward pass through GRU.\n   4) Unpack padding.\n   5) Sum bidirectional GRU outputs.\n   6) Return output and final hidden state.\n\n**Inputs:**\n\n-  ``input_seq``: batch of input sentences; shape=\\ *(max_length,\n   batch_size)*\n-  ``input_lengths``: list of sentence lengths corresponding to each\n   sentence in the batch; shape=\\ *(batch_size)*\n-  ``hidden``: hidden state; shape=\\ *(n_layers x num_directions,\n   batch_size, hidden_size)*\n\n**Outputs:**\n\n-  ``outputs``: output features from the last hidden layer of the GRU\n   (sum of bidirectional outputs); shape=\\ *(max_length, batch_size,\n   hidden_size)*\n-  ``hidden``: updated hidden state from GRU; shape=\\ *(n_layers x\n   num_directions, batch_size, hidden_size)*\n\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class EncoderRNN(nn.Module):\n    def __init__(self, hidden_size, embedding, n_layers=1, dropout=0):\n        super(EncoderRNN, self).__init__()\n        self.n_layers = n_layers\n        self.hidden_size = hidden_size\n        self.embedding = embedding\n\n        # Initialize GRU; the input_size and hidden_size params are both set to 'hidden_size'\n        #   because our input size is a word embedding with number of features == hidden_size\n        self.gru = nn.GRU(hidden_size, hidden_size, n_layers,\n                          dropout=(0 if n_layers == 1 else dropout), bidirectional=True)\n\n    def forward(self, input_seq, input_lengths, hidden=None):\n        # Convert word indexes to embeddings\n        embedded = self.embedding(input_seq)\n        # Pack padded batch of sequences for RNN module\n        packed = nn.utils.rnn.pack_padded_sequence(embedded, input_lengths)\n        # Forward pass through GRU\n        outputs, hidden = self.gru(packed, hidden)\n        # Unpack padding\n        outputs, _ = nn.utils.rnn.pad_packed_sequence(outputs)\n        # Sum bidirectional GRU outputs\n        outputs = outputs[:, :, :self.hidden_size] + outputs[:, : ,self.hidden_size:]\n        # Return output and final hidden state\n        return outputs, hidden"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Decoder\n~~~~~~~\n\nThe decoder RNN generates the response sentence in a token-by-token\nfashion. It uses the encoder\u2019s context vectors, and internal hidden\nstates to generate the next word in the sequence. It continues\ngenerating words until it outputs an *EOS_token*, representing the end\nof the sentence. A common problem with a vanilla seq2seq decoder is that\nif we rely soley on the context vector to encode the entire input\nsequence\u2019s meaning, it is likely that we will have information loss.\nThis is especially the case when dealing with long input sequences,\ngreatly limiting the capability of our decoder.\n\nTo combat this, `Bahdanau et al. <https://arxiv.org/abs/1409.0473>`__\ncreated an \u201cattention mechanism\u201d that allows the decoder to pay\nattention to certain parts of the input sequence, rather than using the\nentire fixed context at every step.\n\nAt a high level, attention is calculated using the decoder\u2019s current\nhidden state and the encoder\u2019s outputs. The output attention weights\nhave the same shape as the input sequence, allowing us to multiply them\nby the encoder outputs, giving us a weighted sum which indicates the\nparts of encoder output to pay attention to. `Sean\nRobertson\u2019s <https://github.com/spro>`__ figure describes this very\nwell:\n\n.. figure:: /_static/img/chatbot/attn2.png\n   :align: center\n   :alt: attn2\n\n`Luong et al. <https://arxiv.org/abs/1508.04025>`__ improved upon\nBahdanau et al.\u2019s groundwork by creating \u201cGlobal attention\u201d. The key\ndifference is that with \u201cGlobal attention\u201d, we consider all of the\nencoder\u2019s hidden states, as opposed to Bahdanau et al.\u2019s \u201cLocal\nattention\u201d, which only considers the encoder\u2019s hidden state from the\ncurrent time step. Another difference is that with \u201cGlobal attention\u201d,\nwe calculate attention weights, or energies, using the hidden state of\nthe decoder from the current time step only. 
Bahdanau et al.\u2019s attention\ncalculation requires knowledge of the decoder\u2019s state from the previous\ntime step. Also, Luong et al.\u00a0provides various methods to calculate the\nattention energies between the encoder output and decoder output which\nare called \u201cscore functions\u201d:\n\n.. figure:: /_static/img/chatbot/scores.png\n   :width: 60%\n   :align: center\n   :alt: scores\n\nwhere $h_t$ = current target decoder state and $\\bar{h}_s$ =\nall encoder states.\n\nOverall, the Global attention mechanism can be summarized by the\nfollowing figure. Note that we will implement the \u201cAttention Layer\u201d as a\nseparate ``nn.Module`` called ``Attn``. The output of this module is a\nsoftmax normalized weights tensor of shape *(batch_size, 1,\nmax_length)*.\n\n.. figure:: /_static/img/chatbot/global_attn.png\n   :align: center\n   :width: 60%\n   :alt: global_attn\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Luong attention layer\nclass Attn(nn.Module):\n    def __init__(self, method, hidden_size):\n        super(Attn, self).__init__()\n        self.method = method\n        if self.method not in ['dot', 'general', 'concat']:\n            raise ValueError(self.method, \"is not an appropriate attention method.\")\n        self.hidden_size = hidden_size\n        if self.method == 'general':\n            self.attn = nn.Linear(self.hidden_size, hidden_size)\n        elif self.method == 'concat':\n            self.attn = nn.Linear(self.hidden_size * 2, hidden_size)\n            self.v = nn.Parameter(torch.FloatTensor(hidden_size))\n\n    def dot_score(self, hidden, encoder_output):\n        return torch.sum(hidden * encoder_output, dim=2)\n\n    def general_score(self, hidden, encoder_output):\n        energy = self.attn(encoder_output)\n        return torch.sum(hidden * energy, dim=2)\n\n    def concat_score(self, hidden, encoder_output):\n        energy = self.attn(torch.cat((hidden.expand(encoder_output.size(0), -1, -1), encoder_output), 2)).tanh()\n        return torch.sum(self.v * energy, dim=2)\n\n    def forward(self, hidden, encoder_outputs):\n        # Calculate the attention weights (energies) based on the given method\n        if self.method == 'general':\n            attn_energies = self.general_score(hidden, encoder_outputs)\n        elif self.method == 'concat':\n            attn_energies = self.concat_score(hidden, encoder_outputs)\n        elif self.method == 'dot':\n            attn_energies = self.dot_score(hidden, encoder_outputs)\n\n        # Transpose max_length and batch_size dimensions\n        attn_energies = attn_energies.t()\n\n        # Return the softmax normalized probability scores (with added dimension)\n        return F.softmax(attn_energies, dim=1).unsqueeze(1)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now that we have defined our attention submodule, we can implement the\nactual decoder model. For the decoder, we will manually feed our batch\none time step at a time. This means that our embedded word tensor and\nGRU output will both have shape *(1, batch_size, hidden_size)*.\n\n**Computation Graph:**\n\n   1) Get embedding of current input word.\n   2) Forward through unidirectional GRU.\n   3) Calculate attention weights from the current GRU output from (2).\n   4) Multiply attention weights to encoder outputs to get new \"weighted sum\" context vector.\n   5) Concatenate weighted context vector and GRU output using Luong eq. 5.\n   6) Predict next word using Luong eq. 6 (without softmax).\n   7) Return output and final hidden state.\n\n**Inputs:**\n\n-  ``input_step``: one time step (one word) of input sequence batch;\n   shape=\\ *(1, batch_size)*\n-  ``last_hidden``: final hidden layer of GRU; shape=\\ *(n_layers x\n   num_directions, batch_size, hidden_size)*\n-  ``encoder_outputs``: encoder model\u2019s output; shape=\\ *(max_length,\n   batch_size, hidden_size)*\n\n**Outputs:**\n\n-  ``output``: softmax normalized tensor giving probabilities of each\n   word being the correct next word in the decoded sequence;\n   shape=\\ *(batch_size, voc.num_words)*\n-  ``hidden``: final hidden state of GRU; shape=\\ *(n_layers x\n   num_directions, batch_size, hidden_size)*\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class LuongAttnDecoderRNN(nn.Module):\n    def __init__(self, attn_model, embedding, hidden_size, output_size, n_layers=1, dropout=0.1):\n        super(LuongAttnDecoderRNN, self).__init__()\n\n        # Keep for reference\n        self.attn_model = attn_model\n        self.hidden_size = hidden_size\n        self.output_size = output_size\n        self.n_layers = n_layers\n        self.dropout = dropout\n\n        # Define layers\n        self.embedding = embedding\n        self.embedding_dropout = nn.Dropout(dropout)\n        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, dropout=(0 if n_layers == 1 else dropout))\n        self.concat = nn.Linear(hidden_size * 2, hidden_size)\n        self.out = nn.Linear(hidden_size, output_size)\n\n        self.attn = Attn(attn_model, hidden_size)\n\n    def forward(self, input_step, last_hidden, encoder_outputs):\n        # Note: we run this one step (word) at a time\n        # Get embedding of current input word\n        embedded = self.embedding(input_step)\n        embedded = self.embedding_dropout(embedded)\n        # Forward through unidirectional GRU\n        rnn_output, hidden = self.gru(embedded, last_hidden)\n        # Calculate attention weights from the current GRU output\n        attn_weights = self.attn(rnn_output, encoder_outputs)\n        # Multiply attention weights to encoder outputs to get new \"weighted sum\" context vector\n        context = attn_weights.bmm(encoder_outputs.transpose(0, 1))\n        # Concatenate weighted context vector and GRU output using Luong eq. 5\n        rnn_output = rnn_output.squeeze(0)\n        context = context.squeeze(1)\n        concat_input = torch.cat((rnn_output, context), 1)\n        concat_output = torch.tanh(self.concat(concat_input))\n        # Predict next word using Luong eq. 6\n        output = self.out(concat_output)\n        output = F.softmax(output, dim=1)\n        # Return output and final hidden state\n        return output, hidden"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define Training Procedure\n-------------------------\n\nMasked loss\n~~~~~~~~~~~\n\nSince we are dealing with batches of padded sequences, we cannot simply\nconsider all elements of the tensor when calculating loss. We define\n``maskNLLLoss`` to calculate our loss based on our decoder\u2019s output\ntensor, the target tensor, and a binary mask tensor describing the\npadding of the target tensor. This loss function calculates the average\nnegative log likelihood of the elements that correspond to a *1* in the\nmask tensor.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def maskNLLLoss(inp, target, mask):\n    nTotal = mask.sum()\n    crossEntropy = -torch.log(torch.gather(inp, 1, target.view(-1, 1)).squeeze(1))\n    loss = crossEntropy.masked_select(mask).mean()\n    loss = loss.to(device)\n    return loss, nTotal.item()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Single training iteration\n~~~~~~~~~~~~~~~~~~~~~~~~~\n\nThe ``train`` function contains the algorithm for a single training\niteration (a single batch of inputs).\n\nWe will use a couple of clever tricks to aid in convergence:\n\n-  The first trick is using **teacher forcing**. This means that at some\n   probability, set by ``teacher_forcing_ratio``, we use the current\n   target word as the decoder\u2019s next input rather than using the\n   decoder\u2019s current guess. This technique acts as training wheels for\n   the decoder, aiding in more efficient training. However, teacher\n   forcing can lead to model instability during inference, as the\n   decoder may not have a sufficient chance to truly craft its own\n   output sequences during training. Thus, we must be mindful of how we\n   are setting the ``teacher_forcing_ratio``, and not be fooled by fast\n   convergence.\n\n-  The second trick that we implement is **gradient clipping**. This is\n   a commonly used technique for countering the \u201cexploding gradient\u201d\n   problem. In essence, by clipping or thresholding gradients to a\n   maximum value, we prevent the gradients from growing exponentially\n   and either overflow (NaN), or overshoot steep cliffs in the cost\n   function.\n\n.. figure:: /_static/img/chatbot/grad_clip.png\n   :align: center\n   :width: 60%\n   :alt: grad_clip\n\nImage source: Goodfellow et al. *Deep Learning*. 2016. 
https://www.deeplearningbook.org/\n\n**Sequence of Operations:**\n\n   1) Forward pass entire input batch through encoder.\n   2) Initialize decoder inputs as SOS_token, and hidden state as the encoder's final hidden state.\n   3) Forward input batch sequence through decoder one time step at a time.\n   4) If teacher forcing: set next decoder input as the current target; else: set next decoder input as current decoder output.\n   5) Calculate and accumulate loss.\n   6) Perform backpropagation.\n   7) Clip gradients.\n   8) Update encoder and decoder model parameters.\n\n\n.. Note ::\n\n  PyTorch\u2019s RNN modules (``RNN``, ``LSTM``, ``GRU``) can be used like any\n  other non-recurrent layers by simply passing them the entire input\n  sequence (or batch of sequences). We use the ``GRU`` layer like this in\n  the ``encoder``. The reality is that under the hood, there is an\n  iterative process looping over each time step calculating hidden states.\n  Alternatively, you ran run these modules one time-step at a time. In\n  this case, we manually loop over the sequences during the training\n  process like we must do for the ``decoder`` model. As long as you\n  maintain the correct conceptual model of these modules, implementing\n  sequential models can be very straightforward.\n\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def train(input_variable, lengths, target_variable, mask, max_target_len, encoder, decoder, embedding,\n          encoder_optimizer, decoder_optimizer, batch_size, clip, max_length=MAX_LENGTH):\n\n    # Zero gradients\n    encoder_optimizer.zero_grad()\n    decoder_optimizer.zero_grad()\n\n    # Set device options\n    input_variable = input_variable.to(device)\n    lengths = lengths.to(device)\n    target_variable = target_variable.to(device)\n    mask = mask.to(device)\n\n    # Initialize variables\n    loss = 0\n    print_losses = []\n    n_totals = 0\n\n    # Forward pass through encoder\n    encoder_outputs, encoder_hidden = encoder(input_variable, lengths)\n\n    # Create initial decoder input (start with SOS tokens for each sentence)\n    decoder_input = torch.LongTensor([[SOS_token for _ in range(batch_size)]])\n    decoder_input = decoder_input.to(device)\n\n    # Set initial decoder hidden state to the encoder's final hidden state\n    decoder_hidden = encoder_hidden[:decoder.n_layers]\n\n    # Determine if we are using teacher forcing this iteration\n    use_teacher_forcing = True if random.random() < teacher_forcing_ratio else False\n\n    # Forward batch of sequences through decoder one time step at a time\n    if use_teacher_forcing:\n        for t in range(max_target_len):\n            decoder_output, decoder_hidden = decoder(\n                decoder_input, decoder_hidden, encoder_outputs\n            )\n            # Teacher forcing: next input is current target\n            decoder_input = target_variable[t].view(1, -1)\n            # Calculate and accumulate loss\n            mask_loss, nTotal = maskNLLLoss(decoder_output, target_variable[t], mask[t])\n            loss += mask_loss\n            print_losses.append(mask_loss.item() * nTotal)\n            n_totals += nTotal\n    else:\n        for t in range(max_target_len):\n            decoder_output, decoder_hidden = decoder(\n                decoder_input, decoder_hidden, 
encoder_outputs\n            )\n            # No teacher forcing: next input is decoder's own current output\n            _, topi = decoder_output.topk(1)\n            decoder_input = torch.LongTensor([[topi[i][0] for i in range(batch_size)]])\n            decoder_input = decoder_input.to(device)\n            # Calculate and accumulate loss\n            mask_loss, nTotal = maskNLLLoss(decoder_output, target_variable[t], mask[t])\n            loss += mask_loss\n            print_losses.append(mask_loss.item() * nTotal)\n            n_totals += nTotal\n\n    # Perform backpropagation\n    loss.backward()\n\n    # Clip gradients: gradients are modified in place\n    _ = nn.utils.clip_grad_norm_(encoder.parameters(), clip)\n    _ = nn.utils.clip_grad_norm_(decoder.parameters(), clip)\n\n    # Adjust model weights\n    encoder_optimizer.step()\n    decoder_optimizer.step()\n\n    return sum(print_losses) / n_totals"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Training iterations\n~~~~~~~~~~~~~~~~~~~\n\nIt is finally time to tie the full training procedure together with the\ndata. The ``trainIters`` function is responsible for running\n``n_iterations`` of training given the passed models, optimizers, data,\netc. This function is quite self explanatory, as we have done the heavy\nlifting with the ``train`` function.\n\nOne thing to note is that when we save our model, we save a tarball\ncontaining the encoder and decoder state_dicts (parameters), the\noptimizers\u2019 state_dicts, the loss, the iteration, etc. Saving the model\nin this way will give us the ultimate flexibility with the checkpoint.\nAfter loading a checkpoint, we will be able to use the model parameters\nto run inference, or we can continue training right where we left off.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def trainIters(model_name, voc, pairs, encoder, decoder, encoder_optimizer, decoder_optimizer, embedding, encoder_n_layers, decoder_n_layers, save_dir, n_iteration, batch_size, print_every, save_every, clip, corpus_name, loadFilename):\n\n    # Load batches for each iteration\n    training_batches = [batch2TrainData(voc, [random.choice(pairs) for _ in range(batch_size)])\n                      for _ in range(n_iteration)]\n\n    # Initializations\n    print('Initializing ...')\n    start_iteration = 1\n    print_loss = 0\n    if loadFilename:\n        start_iteration = checkpoint['iteration'] + 1\n\n    # Training loop\n    print(\"Training...\")\n    for iteration in range(start_iteration, n_iteration + 1):\n        training_batch = training_batches[iteration - 1]\n        # Extract fields from batch\n        input_variable, lengths, target_variable, mask, max_target_len = training_batch\n\n        # Run a training iteration with batch\n        loss = train(input_variable, lengths, target_variable, mask, max_target_len, encoder,\n                     decoder, embedding, encoder_optimizer, decoder_optimizer, batch_size, clip)\n        print_loss += loss\n\n        # Print progress\n        if iteration % print_every == 0:\n            print_loss_avg = print_loss / print_every\n            print(\"Iteration: {}; Percent complete: {:.1f}%; Average loss: {:.4f}\".format(iteration, iteration / n_iteration * 100, print_loss_avg))\n            print_loss = 0\n\n        # Save checkpoint\n        if (iteration % save_every == 0):\n            directory = os.path.join(save_dir, model_name, corpus_name, '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size))\n            if not os.path.exists(directory):\n                os.makedirs(directory)\n            torch.save({\n                'iteration': iteration,\n                'en': encoder.state_dict(),\n                'de': decoder.state_dict(),\n                'en_opt': 
encoder_optimizer.state_dict(),\n                'de_opt': decoder_optimizer.state_dict(),\n                'loss': loss,\n                'voc_dict': voc.__dict__,\n                'embedding': embedding.state_dict()\n            }, os.path.join(directory, '{}_{}.tar'.format(iteration, 'checkpoint')))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Define Evaluation\n-----------------\n\nAfter training a model, we want to be able to talk to the bot ourselves.\nFirst, we must define how we want the model to decode the encoded input.\n\nGreedy decoding\n~~~~~~~~~~~~~~~\n\nGreedy decoding is the decoding method that we use during training when\nwe are **NOT** using teacher forcing. In other words, for each time\nstep, we simply choose the word from ``decoder_output`` with the highest\nsoftmax value. This decoding method is optimal on a single time-step\nlevel.\n\nTo facilite the greedy decoding operation, we define a\n``GreedySearchDecoder`` class. When run, an object of this class takes\nan input sequence (``input_seq``) of shape *(input_seq length, 1)*, a\nscalar input length (``input_length``) tensor, and a ``max_length`` to\nbound the response sentence length. The input sentence is evaluated\nusing the following computational graph:\n\n**Computation Graph:**\n\n   1) Forward input through encoder model.\n   2) Prepare encoder's final hidden layer to be first hidden input to the decoder.\n   3) Initialize decoder's first input as SOS_token.\n   4) Initialize tensors to append decoded words to.\n   5) Iteratively decode one word token at a time:\n       a) Forward pass through decoder.\n       b) Obtain most likely word token and its softmax score.\n       c) Record token and score.\n       d) Prepare current token to be next decoder input.\n   6) Return collections of word tokens and scores.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "class GreedySearchDecoder(nn.Module):\n    def __init__(self, encoder, decoder):\n        super(GreedySearchDecoder, self).__init__()\n        self.encoder = encoder\n        self.decoder = decoder\n\n    def forward(self, input_seq, input_length, max_length):\n        # Forward input through encoder model\n        encoder_outputs, encoder_hidden = self.encoder(input_seq, input_length)\n        # Prepare encoder's final hidden layer to be first hidden input to the decoder\n        decoder_hidden = encoder_hidden[:decoder.n_layers]\n        # Initialize decoder input with SOS_token\n        decoder_input = torch.ones(1, 1, device=device, dtype=torch.long) * SOS_token\n        # Initialize tensors to append decoded words to\n        all_tokens = torch.zeros([0], device=device, dtype=torch.long)\n        all_scores = torch.zeros([0], device=device)\n        # Iteratively decode one word token at a time\n        for _ in range(max_length):\n            # Forward pass through decoder\n            decoder_output, decoder_hidden = self.decoder(decoder_input, decoder_hidden, encoder_outputs)\n            # Obtain most likely word token and its softmax score\n            decoder_scores, decoder_input = torch.max(decoder_output, dim=1)\n            # Record token and score\n            all_tokens = torch.cat((all_tokens, decoder_input), dim=0)\n            all_scores = torch.cat((all_scores, decoder_scores), dim=0)\n            # Prepare current token to be next decoder input (add a dimension)\n            decoder_input = torch.unsqueeze(decoder_input, 0)\n        # Return collections of word tokens and scores\n        return all_tokens, all_scores"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Evaluate my text\n~~~~~~~~~~~~~~~~\n\nNow that we have our decoding method defined, we can write functions for\nevaluating a string input sentence. The ``evaluate`` function manages\nthe low-level process of handling the input sentence. We first format\nthe sentence as an input batch of word indexes with *batch_size==1*. We\ndo this by converting the words of the sentence to their corresponding\nindexes, and transposing the dimensions to prepare the tensor for our\nmodels. We also create a ``lengths`` tensor which contains the length of\nour input sentence. In this case, ``lengths`` is scalar because we are\nonly evaluating one sentence at a time (batch_size==1). Next, we obtain\nthe decoded response sentence tensor using our ``GreedySearchDecoder``\nobject (``searcher``). Finally, we convert the response\u2019s indexes to\nwords and return the list of decoded words.\n\n``evaluateInput`` acts as the user interface for our chatbot. When\ncalled, an input text field will spawn in which we can enter our query\nsentence. After typing our input sentence and pressing *Enter*, our text\nis normalized in the same way as our training data, and is ultimately\nfed to the ``evaluate`` function to obtain a decoded output sentence. We\nloop this process, so we can keep chatting with our bot until we enter\neither \u201cq\u201d or \u201cquit\u201d.\n\nFinally, if a sentence is entered that contains a word that is not in\nthe vocabulary, we handle this gracefully by printing an error message\nand prompting the user to enter another sentence.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def evaluate(encoder, decoder, searcher, voc, sentence, max_length=MAX_LENGTH):\n    ### Format input sentence as a batch\n    # words -> indexes\n    indexes_batch = [indexesFromSentence(voc, sentence)]\n    # Create lengths tensor\n    lengths = torch.tensor([len(indexes) for indexes in indexes_batch])\n    # Transpose dimensions of batch to match models' expectations\n    input_batch = torch.LongTensor(indexes_batch).transpose(0, 1)\n    # Use appropriate device\n    input_batch = input_batch.to(device)\n    lengths = lengths.to(device)\n    # Decode sentence with searcher\n    tokens, scores = searcher(input_batch, lengths, max_length)\n    # indexes -> words\n    decoded_words = [voc.index2word[token.item()] for token in tokens]\n    return decoded_words\n\n\ndef evaluateInput(encoder, decoder, searcher, voc):\n    input_sentence = ''\n    while(1):\n        try:\n            # Get input sentence\n            input_sentence = input('> ')\n            # Check if it is quit case\n            if input_sentence == 'q' or input_sentence == 'quit': break\n            # Normalize sentence\n            input_sentence = normalizeString(input_sentence)\n            # Evaluate sentence\n            output_words = evaluate(encoder, decoder, searcher, voc, input_sentence)\n            # Format and print response sentence\n            output_words[:] = [x for x in output_words if not (x == 'EOS' or x == 'PAD')]\n            print('Bot:', ' '.join(output_words))\n\n        except KeyError:\n            print(\"Error: Encountered unknown word.\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run Model\n---------\n\nFinally, it is time to run our model!\n\nRegardless of whether we want to train or test the chatbot model, we\nmust initialize the individual encoder and decoder models. In the\nfollowing block, we set our desired configurations, choose to start from\nscratch or set a checkpoint to load from, and build and initialize the\nmodels. Feel free to play with different model configurations to\noptimize performance.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Configure models\nmodel_name = 'cb_model'\nattn_model = 'dot'\n#attn_model = 'general'\n#attn_model = 'concat'\nhidden_size = 500\nencoder_n_layers = 2\ndecoder_n_layers = 2\ndropout = 0.1\nbatch_size = 64\n\n# Set checkpoint to load from; set to None if starting from scratch\nloadFilename = None\ncheckpoint_iter = 4000\n#loadFilename = os.path.join(save_dir, model_name, corpus_name,\n#                            '{}-{}_{}'.format(encoder_n_layers, decoder_n_layers, hidden_size),\n#                            '{}_checkpoint.tar'.format(checkpoint_iter))\n\n\n# Load model if a loadFilename is provided\nif loadFilename:\n    # If loading on same machine the model was trained on\n    checkpoint = torch.load(loadFilename)\n    # If loading a model trained on GPU to CPU\n    #checkpoint = torch.load(loadFilename, map_location=torch.device('cpu'))\n    encoder_sd = checkpoint['en']\n    decoder_sd = checkpoint['de']\n    encoder_optimizer_sd = checkpoint['en_opt']\n    decoder_optimizer_sd = checkpoint['de_opt']\n    embedding_sd = checkpoint['embedding']\n    voc.__dict__ = checkpoint['voc_dict']\n\n\nprint('Building encoder and decoder ...')\n# Initialize word embeddings\nembedding = nn.Embedding(voc.num_words, hidden_size)\nif loadFilename:\n    embedding.load_state_dict(embedding_sd)\n# Initialize encoder & decoder models\nencoder = EncoderRNN(hidden_size, embedding, encoder_n_layers, dropout)\ndecoder = LuongAttnDecoderRNN(attn_model, embedding, hidden_size, voc.num_words, decoder_n_layers, dropout)\nif loadFilename:\n    encoder.load_state_dict(encoder_sd)\n    decoder.load_state_dict(decoder_sd)\n# Use appropriate device\nencoder = encoder.to(device)\ndecoder = decoder.to(device)\nprint('Models built and ready to go!')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run Training\n~~~~~~~~~~~~\n\nRun the following block if you want to train the model.\n\nFirst we set training parameters, then we initialize our optimizers, and\nfinally we call the ``trainIters`` function to run our training\niterations.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Configure training/optimization\nclip = 50.0\nteacher_forcing_ratio = 1.0\nlearning_rate = 0.0001\ndecoder_learning_ratio = 5.0\nn_iteration = 4000\nprint_every = 1\nsave_every = 500\n\n# Ensure dropout layers are in train mode\nencoder.train()\ndecoder.train()\n\n# Initialize optimizers\nprint('Building optimizers ...')\nencoder_optimizer = optim.Adam(encoder.parameters(), lr=learning_rate)\ndecoder_optimizer = optim.Adam(decoder.parameters(), lr=learning_rate * decoder_learning_ratio)\nif loadFilename:\n    encoder_optimizer.load_state_dict(encoder_optimizer_sd)\n    decoder_optimizer.load_state_dict(decoder_optimizer_sd)\n\n# Run training iterations\nprint(\"Starting Training!\")\ntrainIters(model_name, voc, pairs, encoder, decoder, encoder_optimizer, decoder_optimizer,\n           embedding, encoder_n_layers, decoder_n_layers, save_dir, n_iteration, batch_size,\n           print_every, save_every, clip, corpus_name, loadFilename)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Run Evaluation\n~~~~~~~~~~~~~~\n\nTo chat with your model, run the following block.\n\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Set dropout layers to eval mode\nencoder.eval()\ndecoder.eval()\n\n# Initialize search module\nsearcher = GreedySearchDecoder(encoder, decoder)\n\n# Begin chatting (uncomment and run the following line to begin)\n# evaluateInput(encoder, decoder, searcher, voc)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Conclusion\n----------\n\nThat\u2019s all for this one, folks. Congratulations, you now know the\nfundamentals to building a generative chatbot model! If you\u2019re\ninterested, you can try tailoring the chatbot\u2019s behavior by tweaking the\nmodel and training parameters and customizing the data that you train\nthe model on.\n\nCheck out the other tutorials for more cool deep learning applications\nin PyTorch!\n\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.6.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}